Improve custom transpilation for faster 1Q/2Q RB #922

itoko · 2022-09-27T05:51:21Z

Summary

This PR is a follow-up of #892 for further performance improvement with several clean-ups suggested in the review of PR 892.

Details and comments

The drastic performance improvement in 1Q/2Q RB (PR 892) was mostly come from two ideas:

Custom transpilation (mapping circuits to physical qubits without using transpile)
Integer-based Clifford operations (especially sparse lookup table with triplet decomposition for 2Q Clifford circuits)

PR 892 implements more than those two (including some not contributing to the performance improvement) and they make difficult to track what really improved the performance and seem overly reduce maintainability of the code.

This PR cleans up the implementation of Standard/InterleavedRB without performance regression and add further optimization in the implementation of (1) custom transpilation with following several suggestions given in PR 892. For insetances,

Change back to the simple code structure introduced at Refactor RB module for future extensions #898
Change back to return wrapped Clifford circuits (not unwrapped ones), such a change (wrapped -> unwrapped) should be discussed as an independent issue
Move StandardRB._layout_for_rb to clifford_utils._transpile_clifford_circuit and add further optimization.
(See also the original draft PR to roughly understand which code blocks are essentially changed for above updates)
Fix a bug that ignores user calibrations in custom transpilation of 1Q/2Q RB circuits

Other features added in this PR includes:

Add the same fix as in Fix performance regression in 3 or more qubits RB #905
Improve validation of interleave_element
Restructure tests

This PR achieves further speed-up (1.2x for 1Q StandardRB, 1.1x for 2Q StandardRB, 3.4x for 1Q InterleavedRB, 1.6x for 2Q InterleavedRB) measured by this benchmark.
This PR will be followed by one more PR that cleans up clifford_utils, which is left with minimal updates in this PR.

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

nkanazawa1989 · 2022-09-28T02:53:47Z

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

-
-    # The following methods are used for RB with more than 2 qubits
-    def _sample_circuits(self):
+        self._cliff_utils = CliffordUtils(


Can this be promoted to actual generator by including _sample_sequences and _sequences_to_circuits? I am implementing another RB variant, that partly uses Clifford sequence but final structure is totally different from standard RB (thus no inheritance). I don't want to implement boilerplate code in such a class.

I see your point, but let me go there (having RBCircuitGenerator class to delegate whole circuit generation) step by step. I'd like to focus on clean-up of Standard/InterleavedRB implementation in this PR, then clean-up of clifford_utils in the next PR (I'll remove the creation of CliffordUtils objects there). I prefer further refactoring should come after them.

nkanazawa1989 · 2022-09-28T02:59:04Z

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

-        if self.num_qubits in [1, 2]:
-            transpiled = self._layout_for_rb()
+        has_custom_transpile_option = (
+            any(


I realized current implementation induces a serious issue when benchmarking the user-calibrated gate. Since conventional sequence implicitly triggers the pulse gate pass in the default pass managers that pulls gate calibrations from the backend instmap that doesn't explicitly show up in the transpile option. This is indeed the breaking API change, because custom calibration will be just ignored but user will never notice this. I think you need to explicitly manage inst map without using pass manager (for performance reason). Alternatively, we can assume V2 backend, where calibration data is always provided with gate entry. Note that this PR will be merged soon. This makes me think we need unittest for missing calibrations.

Addressed in cee4e78

I prefer to have warning or completely remove transpile options, i.e overriding _set_transpile_options method to raise warning/error. Usually users are not aware of what is happening inside .run() and may set some random option from random example code. I think having inst_map only makes sense in this experiment.

Good suggestion. I like the idea restricting acceptable transpile options for each experiment. (too many possible transpile_options was one of my headaches.) For RB experiments, I don't like to but I'll accept basis_gates and optimization_level for now as well as inst_map. Because they are widely used in the tests (Fortunately, they seem not to be used in the tutorial). Forbidding them would require large refactoring of tests, so if we want that, I think it should be done in a separate PR.

…e structure

itoko · 2022-10-03T01:19:51Z

This PR is now ready to be reviewed.

merav-aharoni

I am submitting the comments I have up to here, so that you can have a look at them. I will continue to review, probably only on Thursday, because of our holiday.

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

qiskit_experiments/library/randomized_benchmarking/clifford_utils.py

nkanazawa1989

Thanks @itoko san. Looks almost good to me. Mainly nitpicks.

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

nkanazawa1989 · 2022-10-04T19:29:21Z

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

            transpiled = super()._transpiled_circuits()
-        if self.analysis.options.get("gate_error_ratio", None) is None:
+        else:
+            transpiled = [


Do you think worth having multi-threading here? Note that the standard Qiskit transpiler uses qiskit parallel and this is single thread.

Yeah, I think it's worth investigating.

I've examined parallel transpilation using qiskit.tools.parallel.parallel_map and found that it's much slower than serial one. I guess that is due to the overhead of pickling circuits. Let me keep the current code as is.

Examined code for parallel transpilation:

circuits = self.circuits() if ( len(circuits) > 1 and os.getenv("QISKIT_IN_PARALLEL", "FALSE") == "FALSE" and parallel.PARALLEL_DEFAULT ): # Custom transpilation in parallel transpiled = parallel.parallel_map( _transpile_clifford_circuit, circuits, task_kwargs={"physical_qubits": self.physical_qubits}, )

This is good to know.

Or maybe because I'm using Mac (I somewhere heard parallel does multi-processing so slow in Mac)

On mac by default parallel_map won't actually run in parallel on Python >= 3.8 (https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/tools/parallel.py#L61-L83 defines the defaults) because in Python 3.8 they switched macOS multiprocessing to use spawn instead of fork by default because of some platform bugs. Spawn has a lot of extra overhead and it's also unreliable for user code because you need a if __name__ == "__main__" guard many times to distinguish a child process from the parent so we disable multiprocessing by default on macOS.

In general multiprocessing in python can be very tricky to work with and get a performance boost. Although for transpile() with linux (so using fork()) with involved compilation the performance is typically better even with the pickling and IPC overhead of separate processes (with some caveats Qiskit/qiskit#7741 )

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

qiskit_experiments/library/randomized_benchmarking/interleaved_rb_experiment.py

nkanazawa1989 · 2022-10-04T19:35:46Z

qiskit_experiments/library/randomized_benchmarking/interleaved_rb_experiment.py

+            interleaved_circ.name = f"Clifford-{interleaved_circ.name}"
+            # Transpile circuit with non-basis gates and remove idling qubits
+            try:
+                interleaved_circ = transpile(


Do you need full transpile here? Likely you just need to call basis translator.

Good point. I think it was for delays but it may be unnecessary now since I updated the handling of delays in 36193c9
Let me check if we can replace transpile with basis translation.

I found it's possible but it would be difficult to explain the spec in doc (since we'd need to mention to a protected method _transform_clifford_circuit). Now I think we should revisit here when we redesign the basis translation scheme (removing the use of transpile for 1Q/2Q RB) in the future.

qiskit_experiments/library/randomized_benchmarking/clifford_utils.py

…nt.py Co-authored-by: Naoki Kanazawa <[email protected]>

merav-aharoni

I completed the review. Overall, I think the code looks very nice. I also like the split you made in test_randomized_benchmarking.

qiskit_experiments/library/randomized_benchmarking/clifford_utils.py

qiskit_experiments/library/randomized_benchmarking/interleaved_rb_experiment.py

test/library/randomized_benchmarking/test_randomized_benchmarking.py

merav-aharoni · 2022-10-11T13:12:38Z

Regarding the optimization_level, when I change it to be 0 in all the internal calls to transpile, I get the following error in many of the tests:

qiskit.exceptions.QiskitError: 'wrong param 6.28318530717959 for rz in clifford'

I am not sure exactly where this comes from, but it seems like level 0 does not do a good enough job at transpiling the Cliffords.

itoko · 2022-10-12T01:34:09Z

Benchmarking with https://gist.github.com/itoko/f078218ea6458f32d1c0f9be827f614f

main: Main branch (0359f4c)
prev: Before this PR (0b15ccb)
this: After this PR (2052eac)

method = time_transpiled_circuits for both settings

Standard RB 1Q:

backend	max_length	num_samples	main[s]	prev[s]	this[s]	prev->this
fake_manila	1000	3	6.858569	0.851461	0.817444	0.960049
fake_manila	1000	6	12.919110	1.660500	1.478235	0.890235
fake_manila	2000	3	12.545703	1.642936	1.431839	0.871513
fake_manila	2000	6	25.421245	3.265286	2.816278	0.862491
fake_mumbai	1000	3	6.696955	0.868488	0.767947	0.884235
fake_mumbai	1000	6	13.016662	1.817998	1.430544	0.786879
fake_mumbai	2000	3	12.516374	1.767419	1.416315	0.801346
fake_mumbai	2000	6	24.813701	3.380879	2.778376	0.821791
fake_washington	1000	3	10.513716	0.965256	0.795942	0.824592
fake_washington	1000	6	13.176188	1.944821	1.458366	0.749872
fake_washington	2000	3	12.700285	1.831261	1.451279	0.792503
fake_washington	2000	6	25.937748	3.700574	2.786020	0.752862

Standard RB 2Q:

backend	max_length	num_samples	main[s]	prev[s]	this[s]	prev->this
fake_manila	100	3	2.174022	0.373098	0.565202	1.514887
fake_manila	100	6	3.476085	0.743208	1.428178	1.921639
fake_manila	200	3	3.023454	0.726553	0.657691	0.905221
fake_manila	200	6	6.317040	1.353940	0.987601	0.729428
fake_mumbai	100	3	2.308012	0.362461	0.500592	1.381091
fake_mumbai	100	6	3.734278	0.707195	0.652423	0.922552
fake_mumbai	200	3	3.167860	0.684783	0.625557	0.913510
fake_mumbai	200	6	6.480515	1.380438	0.975742	0.706835
fake_washington	100	3	6.012488	0.407936	0.482093	1.181785
fake_washington	100	6	3.716712	0.836933	0.638941	0.763431
fake_washington	200	3	3.445951	0.733632	0.680576	0.927680
fake_washington	200	6	6.524212	1.499667	1.030386	0.687077

Interleaved RB 1Q:

backend	max_length	num_samples	main[s]	prev[s]	this[s]	prev->this
fake_manila	1000	3	18.058308	4.673694	1.485711	0.317888
fake_manila	1000	6	35.537921	9.215669	2.770748	0.300656
fake_manila	2000	3	35.208998	9.280712	2.657422	0.286338
fake_manila	2000	6	71.094951	18.541894	5.288353	0.285211
fake_mumbai	1000	3	17.977917	4.591728	1.393617	0.303506
fake_mumbai	1000	6	35.188215	9.284733	2.724183	0.293405
fake_mumbai	2000	3	34.964976	8.912482	2.702437	0.303219
fake_mumbai	2000	6	71.040783	18.087106	5.337821	0.295117
fake_washington	1000	3	22.380718	5.125869	1.415023	0.276055
fake_washington	1000	6	36.875958	10.234201	2.752939	0.268994
fake_washington	2000	3	36.186682	10.308951	2.701694	0.262073
fake_washington	2000	6	71.671835	19.716134	5.364398	0.272082

Interleaved RB 2Q:

backend	max_length	num_samples	main[s]	prev[s]	this[s]	prev->this
fake_manila	100	3	3.804730	0.663383	0.596649	0.899403
fake_manila	100	6	7.130793	1.335523	0.860453	0.644282
fake_manila	200	3	6.238839	1.277523	0.833921	0.652764
fake_manila	200	6	12.598263	2.683764	1.338446	0.498720
fake_mumbai	100	3	3.748988	0.694320	0.561316	0.808441
fake_mumbai	100	6	7.109178	1.358674	0.890352	0.655310
fake_mumbai	200	3	6.365602	1.334885	0.861731	0.645547
fake_mumbai	200	6	12.608916	2.633852	1.461120	0.554746
fake_washington	100	3	7.953887	0.819290	0.599565	0.731811
fake_washington	100	6	7.819094	1.608669	0.925331	0.575215
fake_washington	200	3	6.671117	1.493850	0.825585	0.552656
fake_washington	200	6	13.213095	2.938669	1.373048	0.467235

nkanazawa1989

Approved with minor suggestions. They don't block merging.

qiskit_experiments/library/randomized_benchmarking/interleaved_rb_experiment.py

nkanazawa1989 · 2022-10-12T05:04:57Z

qiskit_experiments/library/randomized_benchmarking/rb_experiment.py

            circ = QuantumCircuit(self.num_qubits)
-            circ.barrier(qubits)
+            circ.append(Barrier(self.num_qubits), circ.qubits)


This comment is not addressed. Of course you can do this in the follow-up.

* Improve custom transpilation for faster 1Q/2Q RB with simplifying code structure * Avoid using general Gate objects * Fix handling of interleaved delays * Add custom calibrations support in transpiling circuits * Add more comments * Small cleanups * More precise retrieve of basis gates * Update qiskit_experiments/library/randomized_benchmarking/rb_experiment.py Co-authored-by: Naoki Kanazawa <[email protected]> * Renames and updates following review comments * Remove experiment_type from circuit metadata * Add backend V1 to V2 conversion * Fix docs following review comments * Separate a code block as a private method * Change to create BackendTiming object outside loop Co-authored-by: Naoki Kanazawa <[email protected]>

Improve the performance of transpiled circuit generation in 1Q/2Q StandardRB/InterleavedRB (about 10x speedup in 1Q SRB/IRB and 5x speedup in 2Q SRB/IRB in most cases). Including following feature pull-requests: * New algorithm for RB building Cliffords by layers (#892) * Improve custom transpilation for faster 1Q/2Q RB (#922) * Improve integer-based Clifford operations for 1Q/2Q RB (#940) * Readd support for backends with directed basis gates to RB experiments (#960) Features other than speedup: * Fix performance regression in 3 or more qubits RB (embedded at Refactor RB module for future extensions #898) * Improve validation of interleave_element in InterleavedRB * Restructure tests for RB module * Remove the barriers in front of the first Cliffords in RB circuits Co-authored-by: merav-aharoni <[email protected]> Co-authored-by: Naoki Kanazawa <[email protected]>

nkanazawa1989 reviewed Sep 28, 2022

View reviewed changes

itoko force-pushed the feature-rb-custom-transpile branch from 7f2396f to 0eac7f1 Compare September 28, 2022 10:47

nkanazawa1989 force-pushed the feature/rb_speedup branch from 55604f7 to d34bc12 Compare September 29, 2022 03:57

itoko force-pushed the feature-rb-custom-transpile branch from 0eac7f1 to 492fa1f Compare September 29, 2022 06:31

Improve custom transpilation for faster 1Q/2Q RB with simplifying cod…

36b705f

…e structure

itoko force-pushed the feature-rb-custom-transpile branch from 492fa1f to 36b705f Compare September 29, 2022 06:34

itoko added 4 commits September 29, 2022 15:52

Avoid using general Gate objects

2a488db

Fix handling of interleaved delays

36193c9

Add custom calibrations support in transpiling circuits

cee4e78

Add more comments

4e74c93

itoko marked this pull request as ready for review September 30, 2022 11:00

itoko added 2 commits October 3, 2022 09:55

Small cleanups

e905d98

More precise retrieve of basis gates

2a6ead4

merav-aharoni suggested changes Oct 3, 2022

View reviewed changes

nkanazawa1989 reviewed Oct 4, 2022

View reviewed changes

itoko and others added 4 commits October 5, 2022 15:11

Update qiskit_experiments/library/randomized_benchmarking/rb_experime…

cba264e

…nt.py Co-authored-by: Naoki Kanazawa <[email protected]>

Renames and updates following review comments

25b09a9

Remove experiment_type from circuit metadata

7191bb7

Add backend V1 to V2 conversion

2054a11

merav-aharoni reviewed Oct 6, 2022

View reviewed changes

itoko added 2 commits October 7, 2022 14:47

Fix docs following review comments

25919f8

Separate a code block as a private method

2052eac

itoko mentioned this pull request Oct 7, 2022

Improve integer-based Clifford operations for 1Q/2Q RB itoko/qiskit-experiments#2

Closed

merav-aharoni approved these changes Oct 11, 2022

View reviewed changes

nkanazawa1989 approved these changes Oct 12, 2022

View reviewed changes

Change to create BackendTiming object outside loop

9142df4

itoko force-pushed the feature-rb-custom-transpile branch from de2de95 to 9142df4 Compare October 12, 2022 08:19

itoko merged commit 7dfdcbb into qiskit-community:feature/rb_speedup Oct 12, 2022

This was referenced Oct 12, 2022

Custom transpilation for faster 1Q/2Q RB #906

Closed

Improve integer-based Clifford operations for 1Q/2Q RB #940

Merged

This was referenced Oct 27, 2022

[WIP] Readd support for directed basis gates to RB experiments #957

Closed

Readd support for backends with directed basis gates to RB experiments #960

Merged

itoko mentioned this pull request Nov 29, 2022

Speed up 1Q/2Q RB #982

Merged

itoko mentioned this pull request Jan 19, 2023

Fix performance regression in 3 or more qubits RB #905

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve custom transpilation for faster 1Q/2Q RB #922

Improve custom transpilation for faster 1Q/2Q RB #922

itoko commented Sep 27, 2022 •

edited

Loading

nkanazawa1989 Sep 28, 2022

itoko Sep 28, 2022

nkanazawa1989 Sep 28, 2022

itoko Sep 30, 2022

nkanazawa1989 Oct 4, 2022

itoko Oct 5, 2022

itoko commented Oct 3, 2022

merav-aharoni left a comment

nkanazawa1989 left a comment

nkanazawa1989 Oct 4, 2022

itoko Oct 5, 2022

itoko Oct 5, 2022

nkanazawa1989 Oct 12, 2022

itoko Oct 14, 2022

mtreinish Oct 14, 2022

nkanazawa1989 Oct 4, 2022

itoko Oct 5, 2022

itoko Oct 7, 2022 •

edited

Loading

merav-aharoni left a comment

merav-aharoni commented Oct 11, 2022

itoko commented Oct 12, 2022 •

edited

Loading

nkanazawa1989 left a comment

nkanazawa1989 Oct 12, 2022

Improve custom transpilation for faster 1Q/2Q RB #922

Improve custom transpilation for faster 1Q/2Q RB #922

Conversation

itoko commented Sep 27, 2022 • edited Loading

Summary

Details and comments

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itoko commented Oct 3, 2022

merav-aharoni left a comment

Choose a reason for hiding this comment

nkanazawa1989 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itoko Oct 7, 2022 • edited Loading

Choose a reason for hiding this comment

merav-aharoni left a comment

Choose a reason for hiding this comment

merav-aharoni commented Oct 11, 2022

itoko commented Oct 12, 2022 • edited Loading

nkanazawa1989 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itoko commented Sep 27, 2022 •

edited

Loading

itoko Oct 7, 2022 •

edited

Loading

itoko commented Oct 12, 2022 •

edited

Loading